Offspring-annotated probabilistic context-free grammars
Authors
Abstract
This paper describes the application of a new model to learn probabilistic context-free grammars (PCFGs) from a tree bank corpus. The model estimates the probabilities according to a generalized k-gram scheme for trees. It allows for faster parsing, considerably decreases the perplexity of the test samples, and tends to give more structured and refined parses. In addition, it supports several smoothing techniques, such as backing-off or interpolation, that are used to avoid assigning zero probability to any sentence.
Similar resources
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, the Penn Treebank, was published. However, although originating in computational linguistics, the value of tree banks is becoming more widely appreciated in linguistics research as a whole. F...
Probabilistic Unification Grammars
Recent research has shown that unification grammars can be adapted to incorporate statistical information, thus preserving the processing benefits of stochastic context-free grammars while offering an efficient mechanism for handling dependencies. While complexity studies show that a probabilistic unification grammar achieves an appropriately lower entropy estimate than an equivalent PCFG, the ...
Estimation of Consistent Probabilistic Context-free Grammars
We consider several empirical estimators for probabilistic context-free grammars, and show that the estimated grammars have the so-called consistency property, under the most general conditions. Our estimators include the widely applied expectation maximization method, used to estimate probabilistic context-free grammars on the basis of unannotated corpora. This solves a problem left open in th...
The Generative Power of Probabilistic and Weighted Context-Free Grammars
Over the last decade, probabilistic parsing has become the standard in the parsing literature where one of the purposes of those probabilities is to discard unlikely parses. We investigate the effect that discarding low probability parses has on both the weak and strong generative power of context-free grammars. We prove that probabilistic context-free grammars are more powerful than their non-...
Probabilistic Context-Free Grammars for Syllabification and Grapheme-to-Phoneme Conversion
We investigated the applicability of probabilistic context-free grammars to syllabification and grapheme-to-phoneme conversion. The results show that the standard probability model of context-free grammars performs very well in predicting syllable boundaries. However, our results indicate that the standard probability model does not solve grapheme-to-phoneme conversion sufficiently although, we va...